Abstract

We study a nonlinear regression problem of fitting a circle (or a 
circular arc) to scattered data. We prove that under any standard 
assumptions on the statistical distribution of errors that are commonly 
adopted in the literature, the estimates of the circle center and radius 
have infinite moments. We also discuss methodological implications 
of this fact. 

Keywords: orthogonal regression, errors-in-variables, least squares fit, circle fitting, moments of estimates.

1 Introduction 

Regression models in which all variables are subject to errors are known
as errors-in-variables (EIV) models. The EIV regression problem is quite
different from (and far more difficult than) classical regression, where the
independent variable is assumed to be error-free. EIV regression, even in
the linear case, presents extremely challenging questions and leads to some
counterintuitive results (some of them are mentioned below).

This work is devoted to a nonlinear EIV model where one fits a circle
to scattered data. This is one of the basic tasks in pattern recognition and
computer vision. The need to fit circles to planar images also arises in
biology and medicine, nuclear physics, archeology, industry, and other areas
of human practice.

The most popular method used to solve this problem is orthogonal least 
squares, i.e. the minimization of the sum of squares of the distances from the 
data points to the fitting contour. This method is often called geometric fit 
or orthogonal distance regression (ODR). 

Fitting a circle to observed points $(x_1, y_1), \ldots, (x_n, y_n)$ amounts to
minimizing the objective function

(1)    $\mathcal{F}(a, b, R) = \sum_{i=1}^{n} \left[ \sqrt{(x_i - a)^2 + (y_i - b)^2} - R \right]^2,$

where (a, b) denotes the center and R the radius of the circle. Then the 
parameters of the best fitting circle are defined by 

(2)    $(\hat{a}, \hat{b}, \hat{R}) = \arg\min \mathcal{F}(a, b, R).$
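
As an illustration of (1)-(2), the following Python sketch evaluates the objective function and minimizes it with a plain Gauss-Newton iteration. The choice of Gauss-Newton, the function names `objective` and `geometric_fit`, and the fixed iteration count are assumptions made for this example, not a method prescribed in this paper; the sketch presumes NumPy is available and makes no attempt to handle nearly collinear data.

```python
import numpy as np


def objective(points, a, b, R):
    """Sum of squared orthogonal distances from the points to the circle, as in (1)."""
    d = np.hypot(points[:, 0] - a, points[:, 1] - b)
    return float(np.sum((d - R) ** 2))


def geometric_fit(points, start, iterations=50):
    """Gauss-Newton iteration for (2); `start` is an initial guess (a, b, R).

    Illustrative only: no step control and no safeguard for degenerate data.
    """
    a, b, R = start
    for _ in range(iterations):
        dx, dy = points[:, 0] - a, points[:, 1] - b
        d = np.hypot(dx, dy)
        residuals = d - R
        # Jacobian of the residuals with respect to (a, b, R).
        J = np.column_stack((-dx / d, -dy / d, -np.ones_like(d)))
        # Linearized least squares step: J @ step ~ -residuals.
        step, *_ = np.linalg.lstsq(J, -residuals, rcond=None)
        a, b, R = a + step[0], b + step[1], R + step[2]
    return a, b, R
```

A starting guess can be taken, for instance, from the centroid of the data and the mean distance to it; a poor starting value may prevent convergence.
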

To explore the statistical properties of the estimates $\hat{a}, \hat{b}, \hat{R}$ one needs to
make assumptions on the probability distribution of the data points. It is
commonly assumed that each $(x_i, y_i)$ is a noisy observation of some true point
$(x_i^*, y_i^*)$:

(3)    $x_i = x_i^* + \delta_i, \quad y_i = y_i^* + \varepsilon_i, \quad i = 1, \ldots, n,$

where $(\delta_1, \varepsilon_1), \ldots, (\delta_n, \varepsilon_n)$ are $n$ independent random vectors, usually with
zero mean.

A standard assumption is that each $(\delta_i, \varepsilon_i)$ is a normal (Gaussian) vector
with some covariance matrix $C_i$. The simplest choice is $C_i = \sigma^2 I$, in which
case all the errors $\varepsilon_i$ and $\delta_i$ are i.i.d. normal random variables with zero mean
and a common variance $\sigma^2$. In that case the geometric fit (2) coincides with
the maximum likelihood estimate (MLE), see Chan 1965.

The true points $(x_i^*, y_i^*)$ are supposed to lie on a 'true circle', i.e. satisfy

(4)    $(x_i^* - a^*)^2 + (y_i^* - b^*)^2 = (R^*)^2, \quad i = 1, \ldots, n,$

where $(a^*, b^*, R^*)$ denote the 'true' (unknown) parameters. Therefore

$x_i^* = a^* + R^* \cos\varphi_i, \quad y_i^* = b^* + R^* \sin\varphi_i,$

where $\varphi_1, \ldots, \varphi_n$ specify the location of the true points on the true circle.

The angles $\varphi_1, \ldots, \varphi_n$ can be regarded as fixed unknowns; then they have
to be treated as additional parameters of the model (often called incidental
or latent parameters). This setup is known as a functional model, see Chan
1965.

Alternatively, $\varphi_1, \ldots, \varphi_n$ can be regarded as independent realizations of
a random variable with a certain probability distribution on $[0, 2\pi]$; then one
gets the so-called structural model, see Anderson 1981 or Berman and Culpin
1986. Both models are widely used in the literature.

Many authors study the distribution of the estimates $\hat{a}, \hat{b}, \hat{R}$ under the
above assumptions and try to evaluate their biases and covariance matrix.
Our main result is

Theorem 1. If the probability distribution of each vector $(\delta_i, \varepsilon_i)$ has a
continuous strictly positive density, then $\hat{a}, \hat{b}, \hat{R}$ do not have moments, i.e.

$E(|\hat{a}|) = E(|\hat{b}|) = E(\hat{R}) = \infty.$

Thus the estimates $\hat{a}, \hat{b}, \hat{R}$ have no mean values or variances.

Our assumptions include (but are not limited to) normally distributed
errors. The distribution of $(\delta_i, \varepsilon_i)$ need not be the same for different $i$'s; it
may depend on $i$, but the vectors $(\delta_i, \varepsilon_i)$ must be independent. The mean
value of $(\delta_i, \varepsilon_i)$ need not be zero. The theorem is valid for every $n \ge 3$.

2 Historical remarks 

Our result is not entirely surprising, as a similar theorem was proven
for orthogonal least squares lines by Anderson 1976. Suppose one fits a line
$y = \alpha + \beta x$ to data points $(x_1, y_1), \ldots, (x_n, y_n)$ by minimizing the sum of
squares of (orthogonal) distances, i.e. the estimates are defined by

$(\hat{\alpha}, \hat{\beta}) = \arg\min \frac{1}{1 + \beta^2} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2.$

Again the observed points are random perturbations of some true points, in
the sense of (3), which lie on an unknown true line, i.e. satisfy

(5)    $y_i^* = \alpha^* + \beta^* x_i^*, \quad i = 1, \ldots, n.$

The true points are either fixed parameters (making it a functional model),
or randomly sampled on the true line (structural model).



Theorem 2 (Anderson 1976). If the errors $\delta_i$ and $\varepsilon_i$ are i.i.d. normal
random variables with zero mean and a common variance $\sigma^2 > 0$, then $\hat{\alpha}$
and $\hat{\beta}$ do not have moments, i.e. $E(|\hat{\alpha}|) = E(|\hat{\beta}|) = \infty$.

Until Anderson's discovery, statisticians used to employ Taylor expansions
to derive some 'approximate' formulas for the moments of the estimates
$\hat{\alpha}$ and $\hat{\beta}$ (including their means and variances). Anderson demonstrated
that all those formulas should be regarded as moments of some
approximations, rather than 'approximate moments'.

Anderson's result was rather sensational at the time; it was followed by
heated discussions and a period of acute interest in linear EIV regression.
It also created methodological problems, which we discuss in the next section.

Anderson proved his theorem by using an explicit formula for the density
function of $\hat{\beta}$ (that formula was mentioned but not given in his paper; it
appeared in a later paper by Anderson and Sawa 1982). Anderson also remarked
that his result can be 'intuitively seen' from a well known formula for $\hat{\beta}$:

(6)    $\hat{\beta} = \frac{s_{yy} - s_{xx} + \sqrt{(s_{yy} - s_{xx})^2 + 4 s_{xy}^2}}{2 s_{xy}},$

where standard statistical notation is used: $s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$,
$s_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$, $s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$, and
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$.

Anderson 1976 says that the (continuous) density of $s_{xx}$, $s_{yy}$, and $s_{xy}$ is
positive at values for which the numerator in (6) is different from 0 and the
denominator is equal to 0, hence the integral of the product of $|\hat{\beta}|$ and its
density diverges.

This 'intuitive' explanation can be easily converted into a rigorous proof,
and then one readily extends Anderson's theorem to arbitrary distributions
of errors as long as they have continuous strictly positive densities, as in our
Theorem 1. (Alternatively, one can easily modify our constructions below to
achieve this goal; this is all fairly straightforward, so we omit details.)

We note that Anderson's result was recently extended to some other
estimates of the linear parameters $\alpha$ and $\beta$, see Chen and Kukush 2006 and an
example in Zelniker and Clarkson 2006.

The problem of fitting circles (as well as other nonlinear curves) to data
is technically much more difficult than that of fitting lines. In particular,
there are no explicit formulas for the estimates $\hat{a}$, $\hat{b}$ or $\hat{R}$ analogous to (6),
let alone explicit formulas for their probability densities. All the methods of
computing the estimates $\hat{a}$, $\hat{b}$ or $\hat{R}$ are based on iterative numerical schemes.
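
By contrast, the line case admits a closed form. The following Python sketch evaluates the well known slope of the orthogonal least squares line, $\hat{\beta} = \bigl(s_{yy} - s_{xx} + \sqrt{(s_{yy} - s_{xx})^2 + 4 s_{xy}^2}\bigr)/(2 s_{xy})$. The function name `orthogonal_slope` is hypothetical, and raising an error when $s_{xy} = 0$ is a choice made for this illustration only.

```python
import math


def orthogonal_slope(xs, ys):
    """Slope of the orthogonal least squares line from the closed-form expression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xy == 0.0:
        # Vanishing denominator: the expression is undefined for these data.
        raise ZeroDivisionError("s_xy = 0")
    # math.hypot(p, q) = sqrt(p^2 + q^2), i.e. the square root in the numerator.
    return (s_yy - s_xx + math.hypot(s_yy - s_xx, 2.0 * s_xy)) / (2.0 * s_xy)
```

For the nearly vertical data (-t, -1), (0, 0), (t, 1) the function returns approximately 1/t, which grows without bound as t tends to 0; this is the mechanism behind the heavy tails of the slope estimate.
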

All this makes the problem of fitting circles (ellipses, etc.) so
different from that of fitting lines that Anderson's result apparently passed
unnoticed by the 'curve fitting community'. It is still commonly believed
that the estimates of the curve's parameters have finite moments; thus many
researchers try to minimize their bias and variances or compute Cramér-Rao
lower bounds on the covariance matrix, see e.g. Kanatani 1998 or Chernov and Lesort 2004.

Our theorem shows that the true moments are infinite in the case of fitting 
circles. We believe this result holds for ellipses and other types of curves, 
and we plan to investigate this issue. 

We note that Kukush et al. 2004 recently modified the geometric fitting of 
ellipses to data in order to ensure the consistency of the parameter estimates 
(as the orthogonal regression estimates are inconsistent). They noted that 
their modified estimates had infinite moments, which at the time seemed to 
be a price to pay for consistency. It is now clear that the lack of moments is 
a rather general property of estimates under the EIV model. 

3 Methodological issues 

When Anderson proved his Theorem 2, it immediately led to fundamental
methodological questions: can one trust a statistical estimate that has an
infinite mean square error (not to mention infinite bias)? Can such an estimate
be better than others which have finite moments?

Fitting lines. To explore this issue, Anderson 1976 and 1984, Kunitomo
1980, and others compared the MLE estimate $\hat{\beta}$ given by (6) with the classical
estimate $\hat{\beta} = s_{xy}/s_{xx}$ of the slope of the regression line, which is known to be
optimal when the $x_i$'s are error-free (i.e., $\delta_i = 0$). They denote the former by
$\hat{\beta}_M$ (Maximum likelihood) and the latter by $\hat{\beta}_L$ (Least squares); of course,
both estimates were studied in the framework of the EIV model described
in Section 2. Their results can be summarized in two seemingly conflicting
verdicts:

(a) The mean square error of $\hat{\beta}_M$ is infinite, and that of $\hat{\beta}_L$ is finite
(whenever $n > 4$), thus $\hat{\beta}_L$ appears (infinitely!) more accurate;

(b) The estimate $\hat{\beta}_M$ is consistent and asymptotically unbiased, while $\hat{\beta}_L$ is
inconsistent and asymptotically biased (unless $\beta = 0$).

Besides, Anderson 1976 shows that if $\beta \neq 0$, then

$P(|\hat{\beta}_M - \beta| > t) < P(|\hat{\beta}_L - \beta| > t)$

for all $t > 0$ of practical interest, i.e. the accuracy of $\hat{\beta}_M$ dominates that of $\hat{\beta}_L$
everywhere, except for very large deviations (large $t$). It is the heavy tails of
$\hat{\beta}_M$ that make its mean square error infinite; otherwise it tends to be closer
to $\beta$ than its rival $\hat{\beta}_L$.

Anderson 1976 remarks that this situation, in its extreme, resembles the
following dilemma: suppose we are estimating a parameter $\theta$ whose true value
is $\theta^* = 0$, and we have to choose between two estimates: one, $\hat{\theta}_1$, has a Cauchy
distribution, and the other, $\hat{\theta}_2$, has a normal distribution with mean 100 and
variance 1. Would anyone prefer $\hat{\theta}_2$ only because it has finite moments?
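
This dilemma can be quantified with a few lines of Python. The sketch below assumes a standard Cauchy distribution (location 0, scale 1) for the first estimate, which the text does not specify, and compares the probabilities that each estimate falls within a distance t of the true value 0; the function names are hypothetical.

```python
import math


def prob_within_cauchy(t):
    """P(|X| < t) for a standard Cauchy variable X (location 0, scale 1)."""
    return 2.0 / math.pi * math.atan(t)


def prob_within_normal(t, mean=100.0):
    """P(|Y| < t) for Y ~ N(mean, 1), via the standard normal CDF."""
    def cdf(z):
        return 0.5 * math.erfc(-z / math.sqrt(2.0))
    return cdf(t - mean) - cdf(-t - mean)
```

With t = 10, the Cauchy estimate lies within t of the true value with probability about 0.94, while for the normal estimate this probability is numerically indistinguishable from zero.
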

Thus Anderson and others build a very strong case supporting the MLE
estimate $\hat{\beta}_M$, despite its infinite moments. Furthermore, Gleser 1983 proves
that the MLE estimate $\hat{\beta}_M$ is the best possible in a certain formal sense; we
refer the reader to Chen and Van Ness 1994 for a detailed survey.

Fitting circles. Now we return to the circle fitting problem. It allows
an alternative approach: instead of minimizing geometric distances (1)-(2)
one can minimize the so-called 'algebraic distances':

(7)    $(\hat{a}_0, \hat{b}_0, \hat{R}_0) = \arg\min \sum_{i=1}^{n} \left[ (x_i - a)^2 + (y_i - b)^2 - R^2 \right]^2.$

By changing parameters $A = -2a$, $B = -2b$, and $C = a^2 + b^2 - R^2$ one
reduces (7) to a linear least squares problem

(8)    $(\hat{a}_0, \hat{b}_0, \hat{R}_0) = \arg\min \sum_{i=1}^{n} \left[ x_i^2 + y_i^2 + A x_i + B y_i + C \right]^2,$

which has a unique and explicit solution. This approach is known as a
simple algebraic fit (see Chernov and Lesort 2005) or Delogne-Kasa method
(Zelniker and Clarkson 2006); it was introduced in the 1970s. It has an
obvious advantage of simplicity over the geometric fit, which requires iterative
numerical schemes.
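
The reduction to a linear problem can be sketched in Python as follows. This is a minimal illustration that assumes NumPy is available; the function name `algebraic_fit` is hypothetical, and the linear system is solved with a generic least squares routine rather than any particular scheme from the cited works.

```python
import numpy as np


def algebraic_fit(points):
    """Simple algebraic circle fit via the linear least squares problem in (A, B, C)."""
    x, y = points[:, 0], points[:, 1]
    # Minimize sum of (x_i^2 + y_i^2 + A x_i + B y_i + C)^2 over (A, B, C).
    design = np.column_stack((x, y, np.ones_like(x)))
    (A, B, C), *_ = np.linalg.lstsq(design, -(x**2 + y**2), rcond=None)
    # Undo the change of parameters A = -2a, B = -2b, C = a^2 + b^2 - R^2.
    a, b = -A / 2.0, -B / 2.0
    R = np.sqrt(a**2 + b**2 - C)
    return a, b, R
```

For points lying exactly on a circle the residuals vanish and the function recovers that circle; for noisy data it returns the algebraic estimates, which in general differ from the geometric ones.
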

The competition between the geometric and algebraic circle fits is now
over 30 years old, and so far it has focused on simplicity versus accuracy.
Geometric estimates $(\hat{a}, \hat{b}, \hat{R})$ are widely known to be extremely accurate
in practical applications, despite their slight tendency to overestimate the
circle's radius (the latter was pointed out by Berman 1989). On the other
hand, the Delogne-Kasa estimates are heavily biased toward smaller circles,
see Chernov and Lesort 2004 and 2005 and references therein, and are generally
much less accurate than the geometric estimates.

Now this competition acquires a new, purely statistical momentum. Recently
Zelniker and Clarkson 2006 proved that the Delogne-Kasa estimates
$(\hat{a}_0, \hat{b}_0, \hat{R}_0)$ have finite mean values whenever $n > 3$ and finite variances
whenever $n > 4$. Our work shows that the geometric estimates $(\hat{a}, \hat{b}, \hat{R})$ have
infinite moments.

This competition very much resembles the one described above between
the two line slope estimates: the MLE $\hat{\beta}_M$ and the 'classical least squares'
$\hat{\beta}_L$. It would be interesting to further compare the two circle fits along the
lines of the cited works by Anderson, Kunitomo, Gleser, and others, but this
is perhaps a research program for the distant future.

Alternative parametrizations. One can also say that the non-existence of
moments is an artifact of a poorly chosen parametrization, and the problem is
easily remedied by changing parameters. In the case of lines, one can replace
the slope $\beta$ with the angle $\theta$ the line makes with, say, the y-axis. Then the
line can be described as $x \cos\theta + y \sin\theta + d = 0$. Now it is easy to check
that the estimates of $\theta$ and $d$ have finite moments. These parameters have been
commonly used since Anderson's work in 1976.

In the case of fitting circles, the radius $R$ can be replaced with the curvature
$\rho = 1/R$. It is easy to check that the estimate of $\rho$ has finite moments
(up to the order $2n - 3$). The center coordinates $(a, b)$ can be replaced by, say,
$c = a/R$ and $d = b/R$, which would also have finite moments. Alternatively
one can replace them with $(q, \theta)$ defined by $a = q^{-1} \cos\theta$ and $b = q^{-1} \sin\theta$
(V. Clarkson, private communication). All these new parameters have finite
moments.

Alternatively, one can describe circles by the equation

$A(x^2 + y^2) + Bx + Cy + D = 0$

subject to the constraint $B^2 + C^2 - 4AD = 1$; this was proposed by Pratt 1987.
Now the parameters $(A, B, C, D)$ are defined uniquely; and using the results
of Chernov and Lesort 2005 it is easy to check that they have finite moments.
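
To make the last parametrization concrete, the following Python sketch converts between $(a, b, R)$ and $(A, B, C, D)$ under the constraint $B^2 + C^2 - 4AD = 1$. Completing the square gives $a = -B/(2A)$, $b = -C/(2A)$ and $R = 1/(2|A|)$. The sign convention $A > 0$ and the function names are assumptions made for this illustration.

```python
def circle_to_pratt(a, b, R):
    """Map (a, b, R) to (A, B, C, D) with B^2 + C^2 - 4AD = 1, choosing A > 0."""
    A = 1.0 / (2.0 * R)
    B, C = -2.0 * A * a, -2.0 * A * b
    # D is fixed by the constraint B^2 + C^2 - 4AD = 1.
    D = (B * B + C * C - 1.0) / (4.0 * A)
    return A, B, C, D


def pratt_to_circle(A, B, C, D):
    """Inverse map; A = 0 describes a line and has no finite (a, b, R)."""
    if A == 0.0:
        raise ValueError("A = 0 describes a line, not a circle")
    return -B / (2.0 * A), -C / (2.0 * A), 1.0 / (2.0 * abs(A))
```

For a circle through the origin with center (R, 0), the map gives A = 1/(2R), B = -1, C = 0 and D = 0 (up to rounding), so all four parameters stay bounded and A tends to 0 as R grows; the limit A = 0 is the line x = 0.
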

4 Proof of Theorem 1



It is enough to prove our theorem for the functional model. Indeed, then
in the context of the structural model the conditional expectations of $|\hat{a}|$,
$|\hat{b}|$, and $\hat{R}$ for every given realization of $\varphi_1, \ldots, \varphi_n$ will be infinite, thus their
unconditional expectations will be infinite, too.

Next we need to make a few general remarks. First, the objective function
(1) may not have a minimum. For example, if the data points are collinear,
then $\inf \mathcal{F}(a, b, R) = 0$, but there is no circle that would interpolate $n > 2$
distinct collinear points, hence $\mathcal{F}(a, b, R) > 0$ for all $a, b, R$. In that case the
best fit is achieved by a line, which can be regarded as a 'degenerate circular
arc with infinite radius'.

It is proved in Chernov and Lesort 2005 that if one poses the circle fitting
problem in this 'extended sense', i.e. as finding a circle or a line which
minimizes the sum of squares of distances to the given data points, then the
problem always has a solution. That is, the best fitting contour (a circle or
a line) always exists. The solution may not be unique, though, as the global
minimum of the objective function (1) can be attained simultaneously on
several distinct circles; examples are given in Chernov and Lesort 2005 and
Zelniker and Clarkson 2006.

In the case of multiple solutions, any one can be selected; our theorem
remains valid for any selection. If the best fit is a line, rather than a circle
(for example, if the data are collinear), then we can set $\hat{a} = \hat{b} = \hat{R} = \infty$.

This fact by itself does not prove our theorem, of course, as the probability
of such an exceptional event is zero. It shows, however, that in nearly
collinear cases the estimates $\hat{a}, \hat{b}, \hat{R}$ tend to take arbitrarily large values, and
we will explore this tendency thoroughly.

Simple case n = 3. Our argument is particularly simple if $n = 3$, and
this case also illustrates our main idea.

Let 3 data points be located at $(0, 0)$, $(0, -1)$ and $(x, 1 + y)$, where $x$ and $y$
are small, say $\max\{|x|, |y|\} < h = 10^{-9}$. Note that for $n = 3$ the best fitting
circle simply interpolates the three given points, so by elementary geometry
$\hat{a} = (2 + 3y + y^2 + x^2)/(2x)$, in particular $|\hat{a}| > 1/(3|x|)$. Since the density
of $(x_3, y_3) = (x, 1 + y)$ is continuous and positive, it has a minimum value
$p_0 > 0$ in the rectangle $|x_3| < h$, $|y_3 - 1| < h$. Therefore the conditional
expectation of $|\hat{a}|$, when the other two points are fixed, is

$E(|\hat{a}| \mid B) \ge p_0 \int_{1-h}^{1+h} \int_{-h}^{h} \frac{1}{6|x_3|} \, dx_3 \, dy_3 = \infty,$

where $B = \{(x_1, y_1) = (0, 0), \ (x_2, y_2) = (0, -1)\}$.

A similar estimate holds if the points $(0, 0)$ and $(0, -1)$ are perturbed
slightly, say within a little square of size $h^2$ around their initial positions.
Now the densities of $(x_1, y_1)$ and $(x_2, y_2)$ are also positive, so a direct
integration yields $E(|\hat{a}|) = \infty$. It is also clear that $E(\hat{R}) = \infty$. Rotating our
construction, say by $\pi/2$, we obtain $E(|\hat{b}|) = \infty$, too.
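
The blow-up in the three-point case can be checked numerically. The Python sketch below computes the circle through three points using the standard circumcenter formulas; the function name `circle_through` is hypothetical and the code is only an illustration of the argument, not part of the proof.

```python
def circle_through(p1, p2, p3):
    """Center (a, b) and radius R of the circle through three non-collinear points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    det = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if det == 0.0:
        raise ValueError("collinear points: the best fit is a line")
    s1, s2, s3 = x1**2 + y1**2, x2**2 + y2**2, x3**2 + y3**2
    a = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / det
    b = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / det
    R = ((x1 - a) ** 2 + (y1 - b) ** 2) ** 0.5
    return a, b, R
```

Calling it with (0, 0), (0, -1) and (x, 1 + y) for small x and y returns a first coordinate whose magnitude is of order 1/|x|, consistent with the bound $|\hat{a}| > 1/(3|x|)$, together with b = -1/2.
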

General case n > 3. We modify our previous construction as follows.
Let $h = 10^{-9} n^{-2}$ (here $10^{-9}$ may be replaced with any sufficiently small
constant).

We place our first point $(x_1, y_1)$ in the 'lower' square
$[-h^2, h^2] \times [-1 - h^2, -1 + h^2]$, then $n - 2$ points $(x_i, y_i)$, $i = 2, \ldots, n - 1$,
in the 'central' square $[-h^2, h^2] \times [-h^2, h^2]$, and the last point $(x_n, y_n)$ in the
(horizontally extended) 'upper' rectangle $[-h, h] \times [1 - h^2, 1 + h^2]$.

For any fixed positions of the first $n - 1$ points and any fixed y-coordinate
$y_n$ of the last point, we will examine how the best fitting circle changes as
the x-coordinate $x = x_n$ of the last point changes from $-h$ to $h$. Let $a(x)$
denote the first coordinate of the circle's center (we suppress its dependence
on the other $x_i$ and $y_i$ coordinates). If the best fit is a line (and that line is
clearly almost vertical), we set $a = \infty$. Since $a$ is large, it is more convenient
to work with $\zeta(x) = 1/a(x)$, which is always finite and small.

Observe that all our points $(x_i, y_i)$, $1 \le i \le n$, are located in the
$h^2$-vicinity of three points: $(0, 0)$, $(0, -1)$, and $(x, 1)$, thus the best fitting circle
(or line) passes through the $h^2$-vicinity of these three points, too. By elementary
geometry, if $x = h$, then $a(x) > 1/(2h)$, hence $\zeta(h) \in (0, 2h)$. Similarly,
$a(-h) < -1/(2h)$, hence $\zeta(-h) \in (-2h, 0)$.

As $x = x_n$ changes from $-h$ to $h$, the function $\zeta(x)$ moves from the
negative interval $(-2h, 0)$ into the positive interval $(0, 2h)$, and it stays between
$-2h$ and $2h$. All we need now is that $\zeta(x)$ behave regularly in the following
sense:

Lemma 3 (Regularity). For any fixed values $(x_i, y_i)$, $1 \le i \le n - 1$, and $y_n$,
as above, the function $\zeta(x)$ is differentiable and its derivative is bounded, i.e.
$|\zeta'(x)| \le D$ for some constant $D > 0$. Here $D$ may depend on $n$ and $h$ but
not on the fixed coordinates $(x_i, y_i)$.

The proof of Lemma 3 is rather technical; it is given in the Appendix.

Proof of Theorem 1. Due to the regularity lemma, the function $\zeta(x)$ is
continuous, hence $\zeta(x_0) = 0$ for some $x_0 \in (-h, h)$. The boundedness of the
derivative $\zeta'(x)$ implies that for any $\varepsilon > 0$, if $|x - x_0| < \varepsilon$, then $|\zeta(x)| < D\varepsilon$,
hence $|a(x)| > 1/(D\varepsilon)$. The conditional probability of this event (when
$(x_i, y_i)$ for $i = 1, \ldots, n - 1$ and $y_n$ are fixed) is $\ge p\varepsilon$ with some constant
$p > 0$, due to the positivity of the density of $(x_n, y_n)$. In other words, the
conditional probability that $|\hat{a}| > t$ is at least $p/(Dt)$ for all large $t$, and such a
tail is not integrable. Therefore again, as in the $n = 3$ case, the conditional
expectation of $|\hat{a}|$ is infinite, hence so is the unconditional expectation due to
the positivity of the densities of $(x_i, y_i)$, $i = 1, \ldots, n - 1$.

Our analysis also implies $E(\hat{R}) = \infty$. Rotating our construction by $\pi/2$
gives $E(|\hat{b}|) = \infty$. □

Radial model. Our theorem can be extended to another interesting
model for the circle fitting problem proposed by Berman and Culpin 1986
and further studied by Chernov and Lesort 2004. In this model the error
vector $e_i = (\delta_i, \varepsilon_i)$, cf. (3), satisfies $e_i = \xi_i n_i$, where $n_i$ is a unit normal
vector to the true circle at the true point $(x_i^*, y_i^*)$ and the $\xi_i$'s are independent
normally distributed random variables with zero mean. In other words, the
noise $(\delta_i, \varepsilon_i)$ is normal but restricted to the radial direction (perpendicular
to the circle).

To extend our theorem to this model we need to assume that there are at
least three distinct true points $(x_i^*, y_i^*)$ on the true circumference. We outline
the modifications in our argument needed to cover this new case.

Clearly it is possible that all the data points are collinear, i.e. there is a
line $L$ such that the probability that all the data points lie in the $h$-vicinity
of $L$ is positive for any $h > 0$. Also, for at least one data point its radial
direction (on which its distribution is concentrated) must be transversal to
$L$. Let that point be $(x_n, y_n)$. Now we can repeat our construction by moving
$(x_n, y_n)$ across $L$ and keeping all the other points fixed in a tiny vicinity of
$L$. The technical analysis only requires minor modifications in this new case,
so we omit details.

Acknowledgement. The author is partially supported by NSF grant DMS-0652896.

Appendix

Here we prove our regularity Lemma 3. First we eliminate $R$ from the picture.
The objective function (1) is a quadratic polynomial in $R$, hence it has a
unique global minimum in $R$ when the other two variables $a$ and $b$ are kept
fixed, and it is attained at

(9)    $\hat{R} = R(a, b) = \frac{1}{n} \sum_{i=1}^{n} \sqrt{(x_i - a)^2 + (y_i - b)^2}.$

This allows us to express $\mathcal{F}$ as a function of $a$ and $b$ only:

$\mathcal{F}(a, b) = \sum_{i=1}^{n} \left[ \sqrt{(x_i - a)^2 + (y_i - b)^2} - R(a, b) \right]^2$

(10)    $= n \left[ \bar{z} - 2a\bar{x} - 2b\bar{y} + a^2 + b^2 \right] - n [R(a, b)]^2,$

where for brevity we denote $z_i = x_i^2 + y_i^2$; here and below we use standard
statistical 'sample mean' notation $\bar{z} = \frac{1}{n} \sum_i z_i$, $\bar{x} = \frac{1}{n} \sum_i x_i$, etc.
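
The elimination of $R$ can be verified numerically. The Python sketch below computes the mean distance (9) and the reduced objective (10), and compares the latter with a direct evaluation of (1); the function names and any sample data used with them are hypothetical.

```python
import math


def reduced_objective(points, a, b):
    """Return (F(a, b), R(a, b)): the reduced objective (10) and the radius (9)."""
    n = len(points)
    # (9): the optimal radius is the mean distance from (a, b) to the points.
    R = sum(math.hypot(x - a, y - b) for x, y in points) / n
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    z_bar = sum(x * x + y * y for x, y in points) / n
    # (10): closed form in terms of sample means.
    value = n * (z_bar - 2 * a * x_bar - 2 * b * y_bar + a * a + b * b) - n * R * R
    return value, R


def direct_objective(points, a, b, R):
    """The objective (1) evaluated term by term."""
    return sum((math.hypot(x - a, y - b) - R) ** 2 for x, y in points)
```

For any center (a, b), the two evaluations agree up to rounding when R is the value returned by `reduced_objective`, and perturbing R in either direction does not decrease `direct_objective`.
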

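Next we switch to polar coordinates $a = \rho \cos\theta$ and $b = \rho \sin\theta$, in which
(10) takes the form

(11)    $\frac{1}{n} \mathcal{F}(\rho, \theta) = \bar{z} - 2\rho(\bar{x} \cos\theta + \bar{y} \sin\theta) + \rho^2 - \left[ \frac{1}{n} \sum_{i=1}^{n} \sqrt{z_i - 2\rho(x_i \cos\theta + y_i \sin\theta) + \rho^2} \right]^2.$

Note that $\mathcal{F}$ in (10) and (11) denotes the same function, though expressed
in different sets of variables. We introduce more convenient notation

$u_i = x_i \cos\theta + y_i \sin\theta$ and $v_i = -x_i \sin\theta + y_i \cos\theta$

(observe that $u_i^2 + v_i^2 = z_i$), so that (11) becomes shorter:

(12)    $\frac{1}{n} \mathcal{F}(\rho, \theta) = \bar{z} - 2\rho\bar{u} + \rho^2 - \left[ \frac{1}{n} \sum_{i=1}^{n} \sqrt{z_i - 2\rho u_i + \rho^2} \right]^2.$

Now we introduce another variable

(13)    $w_i = \rho \left[ \sqrt{z_i - 2\rho u_i + \rho^2} - (\rho - u_i) \right] = \frac{v_i^2}{\sqrt{1 - 2u_i\rho^{-1} + z_i\rho^{-2}} + 1 - u_i\rho^{-1}}.$

From (13) we have $\sqrt{z_i - 2\rho u_i + \rho^2} = \rho - u_i + w_i\rho^{-1}$, hence

$\frac{1}{n} \sum_{i=1}^{n} \sqrt{z_i - 2\rho u_i + \rho^2} = \rho - \bar{u} + \bar{w}\rho^{-1}.$

Now (12) takes the form

(14)    $\frac{1}{n} \mathcal{F}(\rho, \theta) = \bar{z} - \bar{u}^2 - 2\bar{w} + 2\bar{u}\bar{w}\rho^{-1} - \bar{w}^2\rho^{-2}.$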
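
The chain of substitutions leading to (14) can be sanity-checked numerically. The Python sketch below evaluates the right-hand side of (14), with $w_i$ computed from the second expression in (13), and compares it with the averaged objective computed directly from the distances to the center $(\rho \cos\theta, \rho \sin\theta)$. The function names are hypothetical, and $\rho > 0$ is assumed.

```python
import math


def averaged_objective_14(points, rho, theta):
    """Right-hand side of (14) for the center (rho cos(theta), rho sin(theta))."""
    n = len(points)
    c, s = math.cos(theta), math.sin(theta)
    u = [x * c + y * s for x, y in points]
    v = [-x * s + y * c for x, y in points]
    w = []
    for ui, vi in zip(u, v):
        zi = ui * ui + vi * vi
        # Second expression in (13).
        root = math.sqrt(1 - 2 * ui / rho + zi / rho**2)
        w.append(vi * vi / (root + 1 - ui / rho))
    z_bar = sum(ui * ui + vi * vi for ui, vi in zip(u, v)) / n
    u_bar, w_bar = sum(u) / n, sum(w) / n
    return z_bar - u_bar**2 - 2 * w_bar + 2 * u_bar * w_bar / rho - (w_bar / rho) ** 2


def averaged_objective_direct(points, rho, theta):
    """(1/n) F from (1), with R equal to the mean distance to the center."""
    n = len(points)
    a, b = rho * math.cos(theta), rho * math.sin(theta)
    d = [math.hypot(x - a, y - b) for x, y in points]
    R = sum(d) / n
    return sum((di - R) ** 2 for di in d) / n
```

The two functions should agree up to rounding for any data and any $\rho > 0$; the form (14) has the advantage that it remains well behaved as $\rho$ grows, which is the regime of interest here.
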

By elementary geometry, the (averaged) objective function takes all its
small values (say, all values less than $h^2/10$) on circles and lines that pass
in the $h$-vicinity of the three basic points: $(0, 0)$, $(0, -1)$ and $(0, 1)$. These
circles and lines have parameters restricted to the region where $\rho > 1/(100h)$
and $|\sin\theta| < 100h$.

We replace the large parameter $\rho$ with its reciprocal $\delta = \rho^{-1}$ and obtain

(15)    $\frac{1}{n} \mathcal{F}(\delta, \theta) = \bar{z} - \bar{u}^2 - 2\bar{w} + 2\bar{u}\bar{w}\delta - \bar{w}^2\delta^2,$

where

$w_i = \frac{v_i^2}{\sqrt{1 - 2u_i\delta + z_i\delta^2} + 1 - u_i\delta}$

(recall that the $u_i$'s and $v_i$'s depend on $\theta$ but not on $\rho$).

Observe that the transformation $\theta \mapsto \theta + \pi$ and $\delta \mapsto -\delta$ leaves the $w_i$'s and
$\mathcal{F}(\delta, \theta)$ unchanged; thus we can let $\delta$ take (small) negative values but keep $\theta$
close to 0. More precisely, we can restrict our analysis to the region

(16)    $\Omega = \{ |\delta| < 100h \ \text{and} \ |\theta| < 100h \}.$

Now one can easily see that the function $\mathcal{F}(\delta, \theta)$ in $\Omega$ is regular in the
following sense: it is continuous and has bounded first and second derivatives
(including partial derivatives) with respect to its variables $\delta$ and $\theta$ and with
respect to $x = x_n$. We denote the first derivatives by $\mathcal{F}_\delta$, $\mathcal{F}_\theta$, $\mathcal{F}_x$ and second
derivatives by $\mathcal{F}_{\delta\delta}$, etc.

All these derivatives are uniformly bounded by a constant $M > 0$ that
may depend on $n$ and $h$ but not on the other point coordinates.

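By direct differentiation of $\mathcal{F}(\delta, \theta)$ we see that

(17)    $\mathcal{F}_{\delta\delta} = 1 - \frac{2}{n} + \chi_1, \quad \mathcal{F}_{\theta\theta} = 4 + \chi_2, \quad \mathcal{F}_{\delta\theta} = \chi_3,$

where the $\chi_i$ are various small quantities (that can be made as small as we please
by further decreasing $h$). Thus, $\mathcal{F}$ is a convex function that has exactly one
minimum in $\Omega$ and no other critical points.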
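
Let $(\hat{\delta}, \hat{\theta})$ denote that unique minimum. Differentiating the equations

$\mathcal{F}_\delta(\hat{\delta}, \hat{\theta}) = 0$ and $\mathcal{F}_\theta(\hat{\delta}, \hat{\theta}) = 0$

with respect to $x$ gives

$\mathcal{F}_{\delta\delta}(\hat{\delta}, \hat{\theta}) \, \hat{\delta}' + \mathcal{F}_{\delta\theta}(\hat{\delta}, \hat{\theta}) \, \hat{\theta}' + \mathcal{F}_{\delta x}(\hat{\delta}, \hat{\theta}) = 0,$
$\mathcal{F}_{\theta\delta}(\hat{\delta}, \hat{\theta}) \, \hat{\delta}' + \mathcal{F}_{\theta\theta}(\hat{\delta}, \hat{\theta}) \, \hat{\theta}' + \mathcal{F}_{\theta x}(\hat{\delta}, \hat{\theta}) = 0,$

where $\hat{\delta}'$ and $\hat{\theta}'$ denote the derivatives with respect to $x$.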

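Since all partial derivatives are uniformly bounded by $M$ and the determinant
of this linear system is $\approx 4 - 8/n$ due to (17), we have that $|\hat{\delta}'| \le 2M$ and
$|\hat{\theta}'| \le 2M$. Lastly, recall that $\zeta = 1/a = \delta / \cos\theta$, hence

$|\zeta'| = \left| \frac{\hat{\delta} \, \hat{\theta}' \sin\hat{\theta}}{\cos^2\hat{\theta}} + \frac{\hat{\delta}'}{\cos\hat{\theta}} \right| \le 4M,$

which proves the lemma with $D = 4M$. □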